withColumn

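Spark's withColumn() function adds a new column to a DataFrame, or changes the value or data type of an existing column. Renaming a column is done with the related withColumnRenamed() function. Both return a new DataFrame and leave the original unchanged.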

// Assumes an active SparkSession named spark (as in spark-shell).
import org.apache.spark.sql.functions.{col, explode}
import spark.implicits._ // enables toDF and the $"column" syntax

val simpleData = Seq(("James", "Sales", 3000), ("Michael", "Sales", 4600), ("Robert", "Sales", 100), ("Maria", "Finance", 3000), ("James", "Sales", 3000), ("Scott", "Finance", 3300), ("Jen", "Finance", 3900), ("Jeff", "Marketing", 3000), ("Kumar", "Marketing", 2000), ("Saif", "Sales", 4100))

val df = simpleData.toDF("employee_name", "department", "salary")
df.select("employee_name", "department", "salary").show

Add new columns to a DataFrame
Add a new column total_sal (the existing salary plus 5000)

df.withColumn("total_sal",col("salary") + 5000).show(5)
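The same idea as a small standalone sketch, in case you are not running inside spark-shell. It builds its own local SparkSession and a two-row copy of the sample data; the country column and its value "USA" are made up for illustration and are not part of the original data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

// Local session for experimentation; in spark-shell a session already exists.
// In a standalone application, call spark.stop() when finished.
val spark = SparkSession.builder().master("local[*]").appName("AddColumnSketch").getOrCreate()
import spark.implicits._

val df = Seq(("James", "Sales", 3000), ("Michael", "Sales", 4600))
  .toDF("employee_name", "department", "salary")

// withColumn returns a new DataFrame; df itself keeps its three columns.
val withTotal = df
  .withColumn("total_sal", col("salary") + 5000) // derived from an existing column
  .withColumn("country", lit("USA"))             // constant column (hypothetical value)

withTotal.show()
```

New columns are appended at the end, in the order the withColumn calls are chained.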


Change the value of an existing column in a DataFrame
Add 5000 to the existing salary column

Values before the change
df.show(5)

Values after the change
df.withColumn("salary", col("salary") + 5000).show(5)
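When the name passed to withColumn already exists, the column is replaced in place rather than appended. A minimal sketch that checks this, assuming a local SparkSession and a two-row copy of the sample data:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for experimentation; in spark-shell a session already exists.
val spark = SparkSession.builder().master("local[*]").appName("OverwriteColumnSketch").getOrCreate()
import spark.implicits._

val df = Seq(("James", "Sales", 3000), ("Michael", "Sales", 4600))
  .toDF("employee_name", "department", "salary")

// Reusing the name "salary" replaces that column; no fourth column is added.
val raised = df.withColumn("salary", col("salary") + 5000)

println(df.columns.mkString(", "))     // original column names
println(raised.columns.mkString(", ")) // same names, same order
raised.show()
```

To keep the new values permanently, assign the result to a val as above, because df itself is never modified.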
 
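Change the column data type
// salary is already an integer column in this sample, so this cast leaves the type as it is;
// cast to another type such as "double" to see the data type change.
df.withColumn("salary", col("salary").cast("Integer")).show(5)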
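Since show() does not print types, a cast is easier to verify with printSchema. The sketch below casts salary to double instead of integer so that the change is visible; it assumes a local SparkSession and a two-row copy of the sample data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for experimentation; in spark-shell a session already exists.
val spark = SparkSession.builder().master("local[*]").appName("CastColumnSketch").getOrCreate()
import spark.implicits._

val df = Seq(("James", "Sales", 3000), ("Michael", "Sales", 4600))
  .toDF("employee_name", "department", "salary")

// Replace salary with a double-typed version of itself.
val casted = df.withColumn("salary", col("salary").cast("double"))

df.printSchema()     // salary is an integer column here
casted.printSchema() // salary is now a double column
```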



Rename a column (withColumnRenamed)
Schema before the change
df.printSchema

Schema after the change
df.withColumnRenamed("employee_name", "empName").printSchema

Rename multiple columns (withColumnRenamed)
df.withColumnRenamed("employee_name","empName")
   .withColumnRenamed("department","dept").printSchema
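Chaining works well for two or three columns. For a longer list, one possible approach is to fold a list of (old name, new name) pairs over the DataFrame. This is only a sketch: the renames list is a hypothetical helper, and the session and data are a local stand-in for the sample above.

```scala
import org.apache.spark.sql.SparkSession

// Local session for experimentation; in spark-shell a session already exists.
val spark = SparkSession.builder().master("local[*]").appName("RenameColumnsSketch").getOrCreate()
import spark.implicits._

val df = Seq(("James", "Sales", 3000), ("Michael", "Sales", 4600))
  .toDF("employee_name", "department", "salary")

// Hypothetical mapping of old column names to new ones.
val renames = Seq("employee_name" -> "empName", "department" -> "dept")

// Apply withColumnRenamed once per pair, threading the DataFrame through.
val renamed = renames.foldLeft(df) { case (acc, (from, to)) =>
  acc.withColumnRenamed(from, to)
}

renamed.printSchema()
```

Note that withColumnRenamed does nothing, and raises no error, when the old name is not in the schema, so a typo in a column name fails silently.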

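Explode columns (withColumn)
Exploding uses withColumn, not withColumnRenamed. The sample df above has no column named b, so this example assumes a DataFrame whose column b holds an array or a map.

df.withColumn("b", explode($"b")).show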
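Because the sample data has no array column, here is a small self-contained sketch of explode. The langs DataFrame and its contents are invented for illustration; only the column name b follows the snippet above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

// Local session for experimentation; in spark-shell a session already exists.
val spark = SparkSession.builder().master("local[*]").appName("ExplodeColumnSketch").getOrCreate()
import spark.implicits._

// Hypothetical data: column b is an array of strings.
val langs = Seq(
  ("James", Seq("Scala", "Java")),
  ("Maria", Seq("Python"))
).toDF("name", "b")

// explode produces one output row per array element,
// repeating the other columns for each element.
val exploded = langs.withColumn("b", explode($"b"))

exploded.show()
```

Rows whose array is null or empty disappear from the result; explode_outer keeps them with a null value instead.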









